In the present paper we discuss the clustering procedure in the case where instead of a single metric we have a family of metrics. In this case we can obtain a partially ordered graph of clusters which is not necessarily a tree. We discuss a structure of a hypergraph above this graph. We propose two definitions of dimension for hyperedges of this hypergraph and show that for the multidimensional p-adic case both dimensions coincide with the number of p-adic parameters. We discuss the application of the hypergraph clustering procedure to the construction of phylogenetic graphs in biology. In this case the dimension of a hyperedge will describe the number of sources of genetic diversity.


The clustering procedure describes the construction of a partially ordered tree of clusters (or hierarchy) starting from a metric on a set of points [1].

In the present paper we investigate the following problem. Assume we have 
instead of a single metric a family of metrics depending on a set of parameters (this 
is a typical situation in applications). We will obtain a family of clusterings. What 
is the structure of this family? Can we describe this family by a single mathematical 
object? We discuss the approach to clustering based on an application of partially 
ordered hypergraphs. 

We start with a pair of examples of hypergraph clustering and then propose a general definition. Our definition is based on the following observation: for two different clusterings which correspond to different metrics it may happen that some clusters (with respect to the different metrics) coincide as sets. This allows us to unify the different clustering trees into a single partially ordered graph. Moreover, it is natural to consider a structure of a hypergraph on this graph, where the hyperedges describe the alternative ways of growth of a cluster as its diameter increases (with respect to the different metrics).

We give a general description of this hypergraph and apply it to a discussion of multidimensional structures in data. Our motivating example is given by the family of different metrics in $\mathbb{Q}_p^d$, which combines both the hierarchy and the multidimensional structure. We propose two definitions of dimension for hyperedges of a hypergraph of clusters. Both definitions of dimension (the A-dimension and the B-dimension, see Section 4 below) in the p-adic case reduce to the number of p-adic parameters (in particular, in this case these dimensions coincide).

In data analysis, trees of clusters are used for classification purposes and describe the diversity in data. One of the important applications of clustering is the construction of phylogenetic trees using the analysis of genomic sequences. The procedure of hypergraph clustering discussed in the present paper allows one to describe the situation when we have several sources of diversity. In particular, the dimension of hyperedges describes the number of sources of diversity for the corresponding data. In bioinformatics this might be helpful in the situation where the analysis of different parts of a genome generates different phylogenetic trees (in particular, for the discussion of a "forest of life" instead of a "tree of life" [2, 3]). This behavior is typical for the cases of reticulate evolution (in particular, hybridization and horizontal gene transfer), where instead of phylogenetic trees one has to consider phylogenetic networks.

An example of a hypergraph of clusters for clustering with respect to a pair of metrics was discussed in [1]. A family of multidimensional ultrametrics on $\mathbb{Q}_p^d$ was investigated in [5] in relation to multidimensional p-adic wavelets with matrix dilations. Analysis in general locally compact ultrametric spaces and wavelets on these spaces were discussed in [8]. For a review of ultrametric mathematical physics see

The exposition of the present paper is as follows. 

In section 2 we discuss two simple examples of hypergraph clustering. 

In section 3 we discuss hypergraph clustering for multidimensional p-adic spaces. 

In section 4 we give general definitions of hypergraph of balls and of dimensions 
of hyperedges for a general ultrametric space with a family of ultrametrics. 

In section 5 we discuss applications of hypergraph clustering and dimensions of 
hyperedges for phylogenetic graphs. 

In section 6 (Appendix) we recall the clustering procedure and the construction 
of duality between trees and ultrametric spaces. 

2 Hypergraph clustering: examples 

Hypergraphs. In the present section we recall the definition of a hypergraph, and 
consider the two simplest examples of the hypergraph clustering procedure. 

A hypergraph is a set V together with a selected system E of finite subsets of V, each containing two or more elements of V. The elements of V are called hypergraph vertices, and the sets in E are called hypergraph edges. If all the edges in E have cardinality two, then the hypergraph is a graph.

The direct product of two graphs $(V_1, E_1)$ and $(V_2, E_2)$ is a hypergraph with the set of vertices $V_1 \times V_2$ and with edges of orders 2 and 4 of the following forms. Let the first and second graphs contain the respective edges $(A_1, B_1)$ and $(A_2, B_2)$. With this pair of edges we associate four 2-edges of the product hypergraph, namely the rows and columns of the $2 \times 2$ matrix

$$\begin{pmatrix} A_1 \times A_2 & A_1 \times B_2 \\ B_1 \times A_2 & B_1 \times B_2 \end{pmatrix}.$$
The set of all entries of this matrix is a 4-edge. We define the set of the product 
hypergraph edges using this procedure: the 2-edges are products of the vertices of 
one graph by the edges of the other graph, and the 4-edges are products of the edges 
of the multiplied graphs. 

In general, the direct product of two hypergraphs $(V_1, E_1)$ and $(V_2, E_2)$ is a hypergraph with the set of vertices $V_1 \times V_2$ and the set of edges

$$V_1 \times E_2 \,\cup\, E_1 \times V_2 \,\cup\, E_1 \times E_2.$$
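The product construction can be checked mechanically. The following Python sketch is our own illustration, not code from the paper: a hypergraph is represented by a set of vertices and a collection of edges given as frozensets, and the three families of product edges are built literally from the formula above. The function name `hypergraph_product` and the vertex labels are hypothetical.

```python
from itertools import product


def hypergraph_product(V1, E1, V2, E2):
    """Direct product of the hypergraphs (V1, E1) and (V2, E2).

    Edges are given as frozensets of vertices. The product has vertex set
    V1 x V2 and edge set V1 x E2 | E1 x V2 | E1 x E2, where the product of a
    vertex (or edge) with an edge is the Cartesian product of the two sets.
    """
    vertices = set(product(V1, V2))
    edges = set()
    for v1, e2 in product(V1, E2):   # vertex of the first factor times an edge
        edges.add(frozenset(product([v1], e2)))
    for e1, v2 in product(E1, V2):   # edge of the first factor times a vertex
        edges.add(frozenset(product(e1, [v2])))
    for e1, e2 in product(E1, E2):   # edge times edge: |e1| * |e2| vertices
        edges.add(frozenset(product(e1, e2)))
    return vertices, edges


# Two cluster trees with three vertices and two edges each (hypothetical labels).
V1 = {"x1", "y1", "x1y1"}
E1 = [frozenset({"x1", "x1y1"}), frozenset({"y1", "x1y1"})]
V2 = {"x2", "y2", "x2y2"}
E2 = [frozenset({"x2", "x2y2"}), frozenset({"y2", "x2y2"})]
V, E = hypergraph_product(V1, E1, V2, E2)
```

For two trees with three vertices and two edges each, the product has 9 vertices, 12 two-point edges and 4 four-point edges, which are the numbers of vertices, 2-edges and 4-edges of the hypergraph $\mathcal{C}_2$ of Example 2 below.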

Hypergraph clustering. Before the introduction of a general definition of hy- 
pergraph clustering we consider several examples. The general idea of our approach 
is that the higher order edges are related to cycles in the union of the clustering 
trees. These cycles describe the different histories of growth of the cluster generated 
by an increase of the diameter of this cluster with respect to the different metrics. 

Example 1. Let us consider the case of a set of three points A, B, C in the two-dimensional real plane $\mathbb{R}^2$ with the standard metric. The parameters defining the metric are the coordinates of the points in the plane.

Assume that the set of clusters (vertices of the cluster tree) contains the clusters A, B, C, AB, ABC (where we denote by ABC the cluster containing A, B and C), and the edges of the tree join the vertices in accordance with the growth of the clusters, i.e. the cluster tree contains the edges

(A, AB), (B, AB), (AB, ABC), (C, ABC).

This defines the tree $\mathcal{A}_1$ of clusters.

Let us consider a variation of the metric (a motion of the points in the plane $\mathbb{R}^2$) which replaces the above cluster set with the set of clusters A, B, C, AC, ABC with the corresponding edges

(A, AC), (C, AC), (AC, ABC), (B, ABC).

This defines the tree $\mathcal{B}_1$ of clusters.

We define the multidimensional (or hypergraph) clustering in the following way. The set of vertices and 2-edges of the hypergraph $\mathcal{C}_1$ under discussion is given by the union of the trees of clusters $\mathcal{A}_1$ and $\mathcal{B}_1$ defined above (where we identify the clusters which coincide as sets). Namely, the vertex set of the hypergraph $\mathcal{C}_1$ contains the clusters

A, B, C, AB, AC, ABC

and the set of 2-edges (two-point edges) of $\mathcal{C}_1$ has the form

(A, AB), (B, AB), (AB, ABC), (C, ABC), (A, AC), (C, AC), (AC, ABC), (B, ABC).

The hypergraph $\mathcal{C}_1$ also contains the 3-edges

(B, AB, ABC), (C, AC, ABC)

and the 4-edge (A, AC, AB, ABC).

The partial order of vertices is given by the inclusion of clusters. This finishes the definition of $\mathcal{C}_1$. The set of the points A, B, C can be called the border of the hypergraph $\mathcal{C}_1$ (it is the border of both trees $\mathcal{A}_1$ and $\mathcal{B}_1$ of clusters).
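To make the union-of-trees construction concrete, here is a minimal Python sketch (our illustration, not the authors' code). It assumes single-linkage clustering as the clustering procedure and uses hypothetical coordinates for A, B, C, chosen so that the two configurations reproduce the trees $\mathcal{A}_1$ and $\mathcal{B}_1$. Clusters are frozensets of labels, so clusters that coincide as sets are identified automatically when the two trees are merged.

```python
import math
from itertools import combinations


def single_linkage_clusters(points):
    """All clusters produced by single-linkage clustering of a finite point set.

    points: dict mapping a label to coordinates in the plane. For every
    threshold t (a pairwise distance) the clusters are the connected components
    of the graph that joins two points at distance <= t.
    """
    labels = list(points)
    dist = {frozenset((a, b)): math.dist(points[a], points[b])
            for a, b in combinations(labels, 2)}
    clusters = {frozenset([a]) for a in labels}
    for t in sorted(set(dist.values())):
        unvisited = set(labels)
        while unvisited:
            stack = [unvisited.pop()]
            component = set(stack)
            while stack:  # depth-first search inside the threshold graph
                a = stack.pop()
                for b in list(unvisited):
                    if dist[frozenset((a, b))] <= t:
                        unvisited.remove(b)
                        component.add(b)
                        stack.append(b)
            clusters.add(frozenset(component))
    return clusters


def tree_edges(clusters):
    """Edges (child, parent) of a cluster tree: the parent is the smallest
    cluster strictly containing the child (clusters of one tree are nested)."""
    edges = set()
    for c in clusters:
        parents = [d for d in clusters if c < d]
        if parents:
            edges.add((c, min(parents, key=len)))
    return edges


# Hypothetical positions of A, B, C before and after a motion of the points.
config_1 = {"A": (0, 0), "B": (1, 0), "C": (0, 3)}  # gives the tree A_1
config_2 = {"A": (0, 0), "B": (3, 0), "C": (0, 1)}  # gives the tree B_1
tree_1 = single_linkage_clusters(config_1)
tree_2 = single_linkage_clusters(config_2)
C1_vertices = tree_1 | tree_2          # equal sets are identified automatically
C1_edges = tree_edges(tree_1) | tree_edges(tree_2)
```

The merged structure has the six clusters A, B, C, AB, AC, ABC and the eight 2-edges listed above for $\mathcal{C}_1$.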

Schematically (see also the next example) the structure of $\mathcal{C}_1$ is described by the table

$$\begin{array}{ccc} A & AC & C \\ AB & ABC & \\ B & & \end{array}$$

where the matrix elements are vertices of $\mathcal{C}_1$, and the 2-edges connect all the neighboring vertices in the table and the pairs (C, ABC), (B, ABC).

The edges of the hypergraph $\mathcal{C}_1$ describe the growth of clusters starting from some vertex. The higher-order edges correspond to cycles in the graph that is the union of the clustering trees $\mathcal{A}_1$ and $\mathcal{B}_1$.

Namely, the 4-edge (A, AC, AB, ABC) describes the following situation. If we start from the vertex A, we can form the two clusters AB and AC, in the trees $\mathcal{A}_1$ and $\mathcal{B}_1$ respectively, which contain A. These clusters are related to clusterings with respect to the two different metrics. Then the cluster in $\mathcal{A}_1$ which contains the cluster AB is the cluster ABC, and the cluster in $\mathcal{B}_1$ which contains AC is again the cluster ABC.

Example 2. Let us consider the case of a set of four points A, B, C, D which are located in the plane $\mathbb{R}^2$ at the vertices of some quadrangle. In this quadrangle, using the clustering with respect to the plane metric, we select the clusters

A, B, C, D, AB, CD, ABCD. (1)

The set of 2-edges contains the edges

(A, AB), (B, AB), (C, CD), (D, CD), (AB, ABCD), (CD, ABCD). (2)

This defines the tree $\mathcal{A}_2$ of clusters.
Let us consider a deformation of this quadrangle (for example, a dilation in some direction in the plane $\mathbb{R}^2$) under which the metric is transformed into a metric which defines the cluster tree $\mathcal{B}_2$ containing the vertices

A, B, C, D, AC, BD, ABCD (3)

and the 2-edges

(A, AC), (C, AC), (B, BD), (D, BD), (AC, ABCD), (BD, ABCD). (4)

Using the trees $\mathcal{A}_2$ and $\mathcal{B}_2$ of clusters, we construct the hypergraph $\mathcal{C}_2$ which contains the unions of the vertex sets and of the 2-edge sets of these trees, and also the four 4-edges

(A, AB, AC, ABCD), (B, AB, BD, ABCD), (C, AC, CD, ABCD), (D, BD, CD, ABCD).
Such a hypergraph can be represented schematically by the table

$$\begin{array}{ccc} A & AC & C \\ AB & ABCD & CD \\ B & BD & D \end{array}$$
The matrix entries are the hypergraph vertices, the 2-edges join the neighboring 
vertices (in the horizontal and vertical directions), and the 4-edges correspond to 
the small 2x2 squares containing the matrix corners and the cluster ABCD. 

As in the previous example, the 4-edges describe the histories of the growth of 
one-point clusters with respect to the different clustering trees. 

Product structure in the hypergraph clustering. Let us show that the hypergraph $\mathcal{C}_2$ described in Example 2 above can be put in the form of the product of two trees of clusters. This product structure reflects the intrinsic multidimensional structure of the data.

Let us consider the two trees $\mathcal{T}_1$ and $\mathcal{T}_2$, which are the trees of clusters in different spaces. The tree $\mathcal{T}_1$ contains the vertices (clusters) $x_1$, $y_1$, $x_1y_1$ and the edges $(x_1, x_1y_1)$, $(y_1, x_1y_1)$. The tree $\mathcal{T}_2$ contains the vertices $x_2$, $y_2$, $x_2y_2$ and the edges $(x_2, x_2y_2)$, $(y_2, x_2y_2)$.

Let us put the hypergraph $\mathcal{C}_2$ in the form of the product $\mathcal{T}_1 \times \mathcal{T}_2$ of the trees. The vertices A, B, C, D of $\mathcal{C}_2$ in this representation take the form of products of vertices of $\mathcal{T}_1$, $\mathcal{T}_2$:

$$\begin{array}{cc} A & C \\ B & D \end{array} \;=\; \begin{array}{cc} x_1 \times x_2 & x_1 \times y_2 \\ y_1 \times x_2 & y_1 \times y_2 \end{array}$$

The other vertices (clusters) of the hypergraph $\mathcal{C}_2$ are unions of the above vertices, for example,

$$AB = \{x_1 \times x_2,\ y_1 \times x_2\} = \{x_1, y_1\} \times x_2,$$

$$AC = \{x_1 \times x_2,\ x_1 \times y_2\} = x_1 \times \{x_2, y_2\}.$$

Here we use the notation AB = {A, B} for the cluster which is the union of the vertices A and B (recall that the notation (·, ·) is used for edges).
The 2-edges of the hypergraph $\mathcal{C}_2$ correspond to edges of one of the trees $\mathcal{T}_1$, $\mathcal{T}_2$ multiplied by vertices of the other tree. For example, the edge (A, AB) is

$$(x_1 \times x_2,\ \{x_1 \times x_2,\ y_1 \times x_2\}) = (x_1, x_1y_1) \times x_2.$$

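The 4-edges of the hypergraph are the products of 2-edges of $\mathcal{T}_1$, $\mathcal{T}_2$. In particular,

$$\begin{array}{cc} A & AC \\ AB & ABCD \end{array} \;=\; \begin{array}{cc} x_1 \times x_2 & \{x_1 \times x_2,\ x_1 \times y_2\} \\ \{x_1 \times x_2,\ y_1 \times x_2\} & \{x_1 \times x_2,\ y_1 \times x_2,\ x_1 \times y_2,\ y_1 \times y_2\} \end{array} \;=\; (x_1, x_1y_1) \times (x_2, x_2y_2).$$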

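The representation of the hypergraph $\mathcal{C}_2$ by the table can be given in the form of the product of the corresponding representations for the trees $\mathcal{T}_1$, $\mathcal{T}_2$:

$$\begin{array}{ccc} A & AC & C \\ AB & ABCD & CD \\ B & BD & D \end{array} \;=\; \begin{array}{ccc} x_1 \times x_2 & \{x_1 \times x_2,\ x_1 \times y_2\} & x_1 \times y_2 \\ \{x_1 \times x_2,\ y_1 \times x_2\} & \{x_1 \times x_2,\ y_1 \times x_2,\ x_1 \times y_2,\ y_1 \times y_2\} & \{x_1 \times y_2,\ y_1 \times y_2\} \\ y_1 \times x_2 & \{y_1 \times x_2,\ y_1 \times y_2\} & y_1 \times y_2 \end{array} \;=\; \begin{array}{c} x_1 \\ x_1y_1 \\ y_1 \end{array} \times \begin{array}{ccc} x_2 & x_2y_2 & y_2 \end{array}$$

This representation reflects the intrinsic two-dimensional structure of the hypergraph $\mathcal{C}_2$.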
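The claim that every vertex of $\mathcal{C}_2$ is a product of a vertex of $\mathcal{T}_1$ and a vertex of $\mathcal{T}_2$ can be checked directly. The Python sketch below (an illustration with hypothetical names) encodes A, B, C, D as pairs of leaves, as in the product table for the vertices A, B, C, D above, and tests whether each cluster is a "rectangle", i.e. the Cartesian product of its two projections, with each projection being a vertex of the corresponding tree.

```python
from itertools import product

# One-point clusters written as pairs (leaf of T_1, leaf of T_2).
A, B, C, D = ("x1", "x2"), ("y1", "x2"), ("x1", "y2"), ("y1", "y2")
C2_vertices = {
    "A": {A}, "B": {B}, "C": {C}, "D": {D},
    "AB": {A, B}, "CD": {C, D}, "AC": {A, C}, "BD": {B, D},
    "ABCD": {A, B, C, D},
}
# Vertices of the trees T_1 and T_2, each given by its set of leaves.
T1 = {"x1": {"x1"}, "y1": {"y1"}, "x1y1": {"x1", "y1"}}
T2 = {"x2": {"x2"}, "y2": {"y2"}, "x2y2": {"x2", "y2"}}


def as_product(cluster):
    """Return (v1, v2) with cluster == leaves(v1) x leaves(v2), or None."""
    first = {a for a, _ in cluster}
    second = {b for _, b in cluster}
    if set(product(first, second)) != cluster:
        return None  # the cluster is not a "rectangle"
    v1 = next((k for k, leaves in T1.items() if leaves == first), None)
    v2 = next((k for k, leaves in T2.items() if leaves == second), None)
    return (v1, v2) if v1 is not None and v2 is not None else None


decomposition = {name: as_product(c) for name, c in C2_vertices.items()}
```

Every cluster decomposes, and the nine decompositions are exactly the nine pairs in $V(\mathcal{T}_1) \times V(\mathcal{T}_2)$; for instance AB corresponds to $(x_1y_1, x_2)$.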



3 p-Adic case

Multidimensional p-adic metric. One of the main examples of hypergraphs of clusters is related to the geometry of balls in multidimensional p-adic spaces. The standard multidimensional ultrametric in $\mathbb{Q}_p^d$ has the form

$$d(x, y) = \max_{i=1,\dots,d} |x_i - y_i|_p, \qquad x = (x_1, \dots, x_d),\ y = (y_1, \dots, y_d).$$

In [5] the following multidimensional deformed metric in $\mathbb{Q}_p^d$ was considered:

$$d_{q_1,\dots,q_d}(x, y) = \max_{i=1,\dots,d} \left(q_i |x_i - y_i|_p\right), \qquad p^{-1} < q_i \le 1. \qquad (5)$$
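As a concrete check of (5), here is a small Python sketch using exact rational arithmetic. It assumes that the points have rational coordinates (a general p-adic number cannot be stored exactly), and the function names `p_adic_norm` and `deformed_metric` are hypothetical.

```python
from fractions import Fraction


def p_adic_norm(x, p):
    """|x|_p = p**(-v) for a rational x = p**v * a / b with p dividing neither a nor b."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)


def deformed_metric(x, y, qs, p):
    """d_{q_1,...,q_d}(x, y) = max_i q_i |x_i - y_i|_p, cf. (5), for rational vectors."""
    return max(Fraction(q) * p_adic_norm(Fraction(a) - Fraction(b), p)
               for q, a, b in zip(qs, x, y))
```

For instance, with p = 2 and $(q_1, q_2) = (3/4, 1)$ the points (0, 0) and (1, 0) are at distance 3/4. Since each term $q_i |x_i - y_i|_p$ satisfies the strong triangle inequality, so does their maximum.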

The unit ball with respect to the metric d(·, ·)

$$\mathbb{Z}_p^d = \{x \in \mathbb{Q}_p^d : |x_i|_p \le 1,\ x = (x_1, \dots, x_d)\}$$

and the dilations $p^k \mathbb{Z}_p^d$, $k \in \mathbb{Z}$, of this ball are balls with respect to all ultrametrics (5) (for all possible choices of the parameters $q_i$). Therefore we can apply the approach of the previous section and consider the hypergraph of clusters (balls) in $\mathbb{Q}_p^d$ with respect to some family of metrics of the form (5).

Let us describe the tree of balls for the metric (5). Assume that for the metric $d_{q_1,\dots,q_d}$ the parameters satisfy the condition $p^{-1} < q_1 < \dots < q_d < 1$. Then the set of all intermediate $d_{q_1,\dots,q_d}$-balls between $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$ is given by the sequence of balls

$$B_a = \mathbb{Z}_p \times \dots \times \mathbb{Z}_p \times p\mathbb{Z}_p \times \dots \times p\mathbb{Z}_p, \qquad (6)$$

with a components $\mathbb{Z}_p$ and $d - a$ components $p\mathbb{Z}_p$, $a = 0, \dots, d$.

This sequence of balls is related to a complete flag over the field $\mathbb{F}_p$ with p elements, where we consider the natural correspondence between the a-dimensional spaces over $\mathbb{F}_p$ and $B_a / p\mathbb{Z}_p^d$.

Recall that a flag is an increasing sequence of subspaces of a finite-dimensional 
vector space. A flag in the space of dimension d is complete if it contains spaces of 
all dimensions 0,1, ... ,d. 

Analogously, if we consider the metric $d_{q_1,\dots,q_d}$ where some of the parameters $q_i$ coincide, we obtain a sequence of balls between $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$ related to an incomplete (partial) flag over $\mathbb{F}_p$.

We also consider a generalization of the metric (5), given by

$$s(x, y) = d_{q_1,\dots,q_d}(Ax, Ay), \qquad (7)$$

where $d_{q_1,\dots,q_d}$ is given by (5) and A is a matrix with matrix elements in $\mathbb{Z}_p$ and $|\det A|_p = 1$ (i.e. a matrix of a linear isometry with respect to the metric $d = d_{1,\dots,1}$).

For a metric from the family (7), the sequence of balls between $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$ (obtained by a linear transformation of (6)) will be related to an arbitrary flag over the finite field $\mathbb{F}_p$. The set of all balls for the metric (7) is given by translations and by dilations by powers of p of the described sequence of balls between $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$.
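Every ball between $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$ is a union of cosets of $p\mathbb{Z}_p^d$, so the flag structure can be inspected in the finite quotient $\mathbb{F}_p^d$. The Python sketch below is a hedged illustration: it assumes $p^{-1} < q_i \le 1$ and an integer matrix A that is invertible modulo p, and the name `balls_around_zero` is hypothetical. With A equal to the identity it reproduces the balls (6); with the coordinate swap it produces a different flag, as described for the metric (7).

```python
from fractions import Fraction
from itertools import product


def balls_around_zero(p, qs, A=None):
    """Balls of s(x, y) = d_q(Ax, Ay) centred at 0, between pZ_p^d and Z_p^d.

    Each ball is returned modulo pZ_p^d, i.e. as a set of vectors in F_p^d.
    Assumes p**-1 < q_i <= 1 and that A is an integer matrix invertible mod p.
    For x in Z_p^d the distance s(x, 0) is the largest q_i such that (Ax)_i is
    a p-adic unit; if Ax lies in pZ_p^d the distance is below every q_i.
    """
    d = len(qs)
    if A is None:
        A = [[int(i == j) for j in range(d)] for i in range(d)]

    def level(x):
        ax = [sum(A[i][j] * x[j] for j in range(d)) % p for i in range(d)]
        # 0 stands for "x lies in pZ_p^d", i.e. the smallest ball of the chain
        return max((qs[i] for i in range(d) if ax[i] != 0), default=0)

    points = list(product(range(p), repeat=d))
    radii = sorted({level(x) for x in points})
    return [frozenset(x for x in points if level(x) <= r) for r in radii]


qs = (Fraction(1, 2), Fraction(4, 5))  # 1/3 < q_1 < q_2 < 1 for p = 3
flag = balls_around_zero(3, qs)                         # metric (5)
swapped = balls_around_zero(3, qs, A=[[0, 1], [1, 0]])  # metric (7)
```

For p = 3, d = 2 the identity gives the chain of subspaces of sizes 1, 3, 9 with the line $\mathbb{F}_3 \times \{0\}$ in the middle, while the swap replaces that line by $\{0\} \times \mathbb{F}_3$.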

Hypergraph of balls. Let us fix some family $\mathbf{s}$ of ultrametrics of the above form and consider the hypergraph $\mathcal{C}(\mathbb{Q}_p^d, \mathbf{s})$, where the vertices are balls (with respect to some of the ultrametrics $s \in \mathbf{s}$), and 2-edges connect two s-balls (with respect to the same metric s) which are nested without intermediate s-balls. Since $p^k \mathbb{Z}_p^d$, $k \in \mathbb{Z}$, are balls with respect to all the ultrametrics described above, one can take the union of the trees $T(\mathbb{Q}_p^d, s)$ of s-balls in $\mathbb{Q}_p^d$ for different $s \in \mathbf{s}$, where we identify the vertices in different $T(\mathbb{Q}_p^d, s)$ (s-balls for different s) which coincide as sets. This gives the sets of vertices and 2-edges of $\mathcal{C}(\mathbb{Q}_p^d, \mathbf{s})$. The set of vertices possesses the natural partial order given by inclusion of balls.

Let the family $\mathbf{s}$ of metrics be sufficiently large; say, it contains the metrics $s_i$ with the parameters $p^{-1} < q_{i_1} < \dots < q_{i_d} < 1$, where for fixed i the indices $\{i_j\}$, $j = 1, \dots, d$, constitute a permutation of $\{1, \dots, d\}$, and the family $\mathbf{s}$ contains the metrics corresponding to all possible permutations of $\{1, \dots, d\}$.

Hyperedges (edges of higher order) of $\mathcal{C}(\mathbb{Q}_p^d, \mathbf{s})$ are constructed as follows. One of the hyperedges in $\mathcal{C}(\mathbb{Q}_p^d, \mathbf{s})$, which we denote by $\mathcal{D}_d$, is given by the union (over all $s \in \mathbf{s}$) of the sets of s-balls lying between $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$ (including $p\mathbb{Z}_p^d$ and $\mathbb{Z}_p^d$). This hyperedge possesses the structure of a partially ordered graph described above.

Smaller hyperedges $\mathcal{E} \subset \mathcal{D}_d$ can be introduced as follows. Let us fix a subfamily $\mathbf{r} \subset \mathbf{s}$ of ultrametrics on $\mathbb{Q}_p^d$. Let us fix some $\mathbf{r}$-ball $I \in \mathcal{D}_d$ (i.e. I is an s-ball with respect to all $s \in \mathbf{r}$) which is strictly smaller than $\mathbb{Z}_p^d$ (in particular, $I \supseteq p\mathbb{Z}_p^d$). Let J be the smallest $\mathbf{r}$-ball in $\mathcal{D}_d$ which is strictly greater than I (since $\mathbb{Z}_p^d$ is an $\mathbf{r}$-ball, the ball J exists; the uniqueness of J follows from the ultrametricity of $s \in \mathbf{r}$). We define $\mathcal{E}$ as the family $\{K : I \subseteq K \subseteq J\}$ of s-balls for $s \in \mathbf{r}$ (i.e. any K is an s-ball for some $s \in \mathbf{r}$). In particular, $I = \min(\mathcal{E})$, $J = \max(\mathcal{E})$.

Other hyperedges in $\mathcal{C}(\mathbb{Q}_p^d, \mathbf{s})$ are given by translations and dilations of the hyperedges $\mathcal{E}$ described above, considered as finite sets of balls in $\mathbb{Q}_p^d$.



Compatible families of ultrametrics. We say that the family $\mathbf{s}$ of ultrametrics on $\mathbb{Q}_p^d$ is compatible if for any two balls, an s-ball I and an r-ball J, $s, r \in \mathbf{s}$, the intersection $I \cap J$ is a ball with respect to some ultrametric $t \in \mathbf{s}$.

The property of compatibility is not satisfied automatically for an arbitrary family $\mathbf{r}$ of ultrametrics. As we discussed above, ultrametrics on $\mathbb{Q}_p^d$ are related to flags over the finite field $\mathbb{F}_p$. For a family of flags, the intersection of spaces taken from different flags of the family need not be a space belonging to any flag of the family.
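The compatibility condition is easy to test on finite examples. The Python sketch below is our illustration on a finite set rather than on $\mathbb{Q}_p^d$: the balls of each metric are listed explicitly, and, as an assumption of ours (the definition above does not address it), disjoint balls are allowed, so only non-empty intersections are checked. The family of Example 2 passes; the small family on {1, 2, 3, 4} is a hypothetical counterexample.

```python
from itertools import combinations, product


def is_compatible(balls_by_metric):
    """Compatibility of a family of ultrametrics on a finite set.

    balls_by_metric maps a metric name to the set of all its balls (frozensets).
    Assumption: disjoint balls are allowed; every non-empty intersection of an
    s-ball and an r-ball must be a ball of some metric t of the family.
    """
    all_balls = set().union(*balls_by_metric.values())
    for s, r in combinations(balls_by_metric, 2):
        # two balls of the same ultrametric are nested or disjoint, so only
        # pairs of different metrics need to be checked
        for I, J in product(balls_by_metric[s], balls_by_metric[r]):
            K = I & J
            if K and K not in all_balls:
                return False
    return True


def balls(*names):
    """Helper: each name such as "AB" denotes the ball {A, B}."""
    return {frozenset(name) for name in names}


example_2 = {"a": balls("A", "B", "C", "D", "AB", "CD", "ABCD"),
             "b": balls("A", "B", "C", "D", "AC", "BD", "ABCD")}
# Hypothetical non-compatible family on {1, 2, 3, 4}: {1,2,3} and {2,3,4} meet in {2,3}.
bad = {"s": balls("1", "2", "3", "4", "123", "1234"),
       "r": balls("1", "2", "3", "4", "234", "1234")}
```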

Embedding of hypergraphs of clusters into p-adic hypergraphs of balls.

Let us show that the hypergraphs discussed in the previous section can be embedded into a hypergraph associated with a family of multidimensional p-adic metrics. Let us consider the quadruple of points in $\mathbb{Q}_2^2$

$$\begin{array}{cc} A & C \\ B & D \end{array} \;=\; \begin{array}{cc} (0,0) & (0,1) \\ (1,0) & (1,1) \end{array}$$

and perform the clustering procedure with respect to the pair of metrics in $\mathbb{Q}_2^2$ of the form $d_{q,1}(\cdot, \cdot)$, $d_{1,q}(\cdot, \cdot)$, $2^{-1} < q < 1$.

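It is easy to see that the metric $d_{q,1}$ generates the tree $\mathcal{A}_2$ of clusters with the set of clusters (1) and the set of edges (2); analogously, the metric $d_{1,q}$ generates the tree $\mathcal{B}_2$ of clusters with the set of clusters (3) and the set of edges (4). Indeed, $d_{q,1}(A, B) = d_{q,1}(C, D) = q$ and $d_{1,q}(A, C) = d_{1,q}(B, D) = q$, while all the other distances between the four points are equal to 1.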

Therefore clustering with respect to this pair of metrics generates the hypergraph $\mathcal{C}_2$ described in Example 2 in the previous section. The product structure of the hypergraph $\mathcal{C}_2$ described above thereby acquires a natural interpretation as the two-dimensional structure of $\mathbb{Q}_2^2$.
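The statement that $d_{q,1}$ and $d_{1,q}$ reproduce the trees $\mathcal{A}_2$ and $\mathcal{B}_2$ can be verified by brute force. This Python sketch is an illustration under stated assumptions: it fixes q = 3/4 (any value in (1/2, 1) should behave the same), uses integer points, and relies on the fact that in a finite ultrametric space the clusters of the clustering procedure are exactly the balls. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def norm_2(n):
    """2-adic absolute value of an integer n."""
    if n == 0:
        return Fraction(0)
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return Fraction(1, 2 ** v)


def dist(qs, x, y):
    """d_{q_1,q_2}(x, y) = max_i q_i |x_i - y_i|_2 for integer points of Q_2^2."""
    return max(q * norm_2(a - b) for q, a, b in zip(qs, x, y))


def balls(points, qs):
    """All balls {y : d(x, y) <= r} of the finite ultrametric space `points`,
    with centres x in the space and radii r among the occurring distances.
    For a finite ultrametric space these are exactly the clusters."""
    radii = {Fraction(0)} | {dist(qs, x, y) for x, y in combinations(points.values(), 2)}
    return {frozenset(name for name, y in points.items() if dist(qs, x, y) <= r)
            for x in points.values() for r in radii}


points = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}
q = Fraction(3, 4)  # any 1/2 < q < 1
tree_A2 = balls(points, (q, 1))  # metric d_{q,1}
tree_B2 = balls(points, (1, q))  # metric d_{1,q}
```

Restricting `points` to A, B, C gives the trees $\mathcal{A}_1$ and $\mathcal{B}_1$ of Example 1 in the same way.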

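The hypergraph $\mathcal{C}_2$ possesses a natural embedding into the hypergraph of balls in $\mathbb{Q}_2^2$ with respect to the pair of metrics $d_{q,1}$, $d_{1,q}$. The correspondence between the minimal vertices in $\mathcal{C}_2$ and balls in $\mathbb{Q}_2^2$ is given by $x \mapsto x + 2\mathbb{Z}_2^2$:

$$A = (0,0) \mapsto 2\mathbb{Z}_2 \times 2\mathbb{Z}_2, \qquad C = (0,1) \mapsto 2\mathbb{Z}_2 \times (1 + 2\mathbb{Z}_2),$$

$$B = (1,0) \mapsto (1 + 2\mathbb{Z}_2) \times 2\mathbb{Z}_2, \qquad D = (1,1) \mapsto (1 + 2\mathbb{Z}_2) \times (1 + 2\mathbb{Z}_2).$$

The balls corresponding to non-minimal vertices in $\mathcal{C}_2$ are constructed as the corresponding unions of the above balls.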
Analogously, if we restrict the hypergraph clustering procedure related to the pair of metrics $d_{q,1}$, $d_{1,q}$ to the set of the three points

$$\begin{array}{cc} A & C \\ B & \end{array} \;=\; \begin{array}{cc} (0,0) & (0,1) \\ (1,0) & \end{array}$$

we get the hypergraph $\mathcal{C}_1$ described in Example 1.

4 Hypergraph of balls for general ultrametric spaces 

Hypergraph of balls. In the present section we generalize the approach of the 
previous section to the case of general locally compact ultrametric spaces. 

Let X be a locally compact ultrametric space with some family $\mathbf{s}$ of ultrametrics defined on X. Moreover, assume that for any pair of metrics $s, r \in \mathbf{s}$ any s-ball is a finite union of r-balls. In particular, all metrics in $\mathbf{s}$ define the same topology on X. We call X a multidimensional ultrametric space. An example of a multidimensional ultrametric space is given by the space $\mathbb{Q}_p^d$ with the family (5) of metrics considered in the previous section.

We define the partially ordered hypergraph $\mathcal{C}(X, \mathbf{s})$ in a way similar to the one we used for the p-adic case. The hypergraph $\mathcal{C}(X, \mathbf{s})$ as a graph is the union of the trees $T(X, s)$ of s-balls, $s \in \mathbf{s}$. Namely, the set of vertices of $\mathcal{C}(X, \mathbf{s})$ is the union of the sets of s-balls, $s \in \mathbf{s}$; edges connect s-balls (with the same s) nested without intermediates. The partial order is given by the inclusion of subsets in X. If some s-ball coincides with some r-ball as a set, they define the same vertex in $\mathcal{C}(X, \mathbf{s})$.

The family $\mathbf{s}$ of ultrametrics on X is compatible if for any two balls, an s-ball I and an r-ball J, $s, r \in \mathbf{s}$, the intersection $I \cap J$ is a ball with respect to some ultrametric $t \in \mathbf{s}$.

Hyperedges $\mathcal{E}$ in $\mathcal{C}(X, \mathbf{s})$ are introduced as follows. Let us fix a subfamily $\mathbf{r} \subset \mathbf{s}$ of ultrametrics on X. Let us fix some $\mathbf{r}$-ball I (i.e. I is an s-ball with respect to all $s \in \mathbf{r}$). Let J be the smallest $\mathbf{r}$-ball which is strictly greater than I. We define $\mathcal{E}$ as the family $\{K : I \subseteq K \subseteq J\}$ of s-balls for $s \in \mathbf{r}$ (i.e. any K is an s-ball for some $s \in \mathbf{r}$). In particular, $I, J \in \mathcal{E}$.
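For a finite space the definition can be applied mechanically. The Python sketch below is our illustration: the balls of each metric are listed explicitly (with the hypothetical helper `balls`), and for every $\mathbf{r}$-ball I other than the whole space it takes the smallest $\mathbf{r}$-ball J strictly containing I and collects all balls of the metrics in $\mathbf{r}$ lying between them. On the data of Examples 1 and 2 it reproduces the 3-edges and the 4-edge of $\mathcal{C}_1$ and the four 4-edges of $\mathcal{C}_2$.

```python
def hyperedges(balls_by_metric, subfamily):
    """r-hyperedges of the hypergraph C(X, s) for a finite space X.

    balls_by_metric maps a metric name to the set of its balls (frozensets);
    subfamily lists the metrics forming r. For every r-ball I (a ball for all
    metrics in r) below the whole space, J is the smallest r-ball strictly
    containing I and the hyperedge is {K : I <= K <= J, K a ball of some r}.
    """
    common = set.intersection(*(set(balls_by_metric[s]) for s in subfamily))
    union = set().union(*(balls_by_metric[s] for s in subfamily))
    result = []
    for I in common:
        above = [J for J in common if I < J]
        if not above:
            continue  # I is the whole space
        # the r-balls containing I are balls of one ultrametric, hence a chain
        J = min(above, key=len)
        result.append(frozenset(K for K in union if I <= K <= J))
    return result


def balls(*names):
    """Helper: each name such as "AB" denotes the ball {A, B}."""
    return {frozenset(name) for name in names}


example_1 = {"a": balls("A", "B", "C", "AB", "ABC"),
             "b": balls("A", "B", "C", "AC", "ABC")}
example_2 = {"a": balls("A", "B", "C", "D", "AB", "CD", "ABCD"),
             "b": balls("A", "B", "C", "D", "AC", "BD", "ABCD")}
E1 = hyperedges(example_1, ["a", "b"])
E2 = hyperedges(example_2, ["a", "b"])
```

Taking a subfamily with a single metric returns ordinary 2-edges, since then J is the parent of I in that tree.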

Let us note that for an $\mathbf{r}$-ball I the minimal $\mathbf{r}$-ball $J \supsetneq I$ does not necessarily exist (such a ball always exists for ultrametric spaces containing a finite number of points, if I does not coincide with the whole space). If such an $\mathbf{r}$-ball J exists, it is uniquely defined.

The introduced hyperedges possess the natural partial order by the inclusion of 
sets of balls. 

Dimension of an hyperedge. For the p-adic hypergraphs of balls considered 
in the previous section we have a natural definition of dimension. In this case the 
dimension of a hyperedge is the number of p-adic parameters which one can use for 
the description of this hyperedge. Let us discuss a notion of dimension which is 
applicable for general hypergraphs of balls in multidimensional ultrametric spaces. 

Let X be a multidimensional ultrametric space with a family $\mathbf{s}$ of ultrametrics. Let us consider an $\mathbf{r}$-hyperedge $\mathcal{E} \in \mathcal{C}(X, \mathbf{s})$, $\mathbf{r} \subset \mathbf{s}$, with the minimal $\mathbf{r}$-ball I and the maximal $\mathbf{r}$-ball J. There are two properties of p-adic hyperedges which one can generalize to the general case:

A) The length of a maximal sequence of nested balls between the minimal and the maximal balls in a hyperedge;

B) The number of maximal subballs in J (with respect to metrics $r \in \mathbf{r}$) which contain the ball I. Here the different maximal subballs in J will be balls with respect to the different metrics $r \in \mathbf{r}$.

These observations motivate the following definition.

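Definition 1 Let $\mathcal{E}$ be an $\mathbf{r}$-hyperedge in $\mathcal{C}(X, \mathbf{s})$, $\mathbf{r} \subset \mathbf{s}$, with the minimal $\mathbf{r}$-ball I and the maximal $\mathbf{r}$-ball J.

The A-dimension of the hyperedge $\mathcal{E}$ is the maximum of the lengths of increasing paths in $\mathcal{E}$ from I to J (with respect to the partial order in $\mathcal{E}$); here the length of a path in a graph is the number of edges in this path.

The B-dimension of the hyperedge $\mathcal{E}$ is the number of balls $J_k$, where $I \subseteq J_k \subsetneq J$ and $J_k$ is a maximal subball of J with respect to some metric $r \in \mathbf{r}$.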
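Both dimensions can be computed for finite examples. In the Python sketch below (our illustration, with hypothetical names) the increasing paths are taken along the 2-edges of $\mathcal{C}(X, \mathbf{s})$, i.e. pairs of balls of one metric nested without intermediates; this is our reading of "increasing paths", and the balls are listed explicitly as in Example 2. For the 4-edge (A, AB, AC, ABCD) of Example 2 both dimensions come out equal to 2.

```python
from functools import lru_cache


def covering_edges(balls):
    """2-edges of one tree of balls: pairs (K, L) with K < L and no ball in between."""
    return {(K, L) for K in balls for L in balls
            if K < L and not any(K < M < L for M in balls)}


def a_dimension(edge, balls_by_metric, subfamily):
    """Longest increasing path (counted in 2-edges) from min(edge) to max(edge)."""
    I, J = min(edge, key=len), max(edge, key=len)
    steps = {(K, L) for s in subfamily
             for K, L in covering_edges(balls_by_metric[s])
             if K in edge and L in edge}

    @lru_cache(maxsize=None)
    def longest(K):
        # inside its own tree every K in the hyperedge climbs to J, so max() is non-empty
        return 0 if K == J else max(1 + longest(L) for K0, L in steps if K0 == K)

    return longest(I)


def b_dimension(edge, balls_by_metric, subfamily):
    """Number of balls J_k with I <= J_k < J that are maximal subballs of J for some metric."""
    I, J = min(edge, key=len), max(edge, key=len)
    return len({K for s in subfamily
                for K, L in covering_edges(balls_by_metric[s])
                if L == J and I <= K})


def balls(*names):
    """Helper: each name such as "AB" denotes the ball {A, B}."""
    return {frozenset(name) for name in names}


example_2 = {"a": balls("A", "B", "C", "D", "AB", "CD", "ABCD"),
             "b": balls("A", "B", "C", "D", "AC", "BD", "ABCD")}
edge_A = frozenset(map(frozenset, ["A", "AB", "AC", "ABCD"]))
```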

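Example. Let us consider the space $\mathbb{Q}_p^d$ with the family $\mathbf{s}$ of metrics (5), which is sufficiently large in the sense described in Section 3. In this case we have the maximal (with respect to the partial order on hyperedges) $\mathbf{s}$-hyperedge $\mathcal{D}_d$ with the minimal $\mathbf{s}$-ball $p\mathbb{Z}_p^d$ and the maximal $\mathbf{s}$-ball $\mathbb{Z}_p^d$.

Both the A-dimension and the B-dimension of this hyperedge are equal to d. Indeed, each 2-edge in $\mathcal{D}_d$ replaces one factor $p\mathbb{Z}_p$ by $\mathbb{Z}_p$, so every increasing path from $p\mathbb{Z}_p^d$ to $\mathbb{Z}_p^d$ has length d, and the maximal subballs of $\mathbb{Z}_p^d$ containing $p\mathbb{Z}_p^d$ are the d balls obtained by replacing one factor $\mathbb{Z}_p$ by $p\mathbb{Z}_p$. Therefore these dimensions coincide with the number of p-adic coordinates in $\mathbb{Q}_p^d$.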

For a hypergraph of balls related to a general multidimensional ultrametric space X with a family of metrics $\mathbf{s}$, different maximal hyperedges may have different dimensions, and the A-dimension and the B-dimension of a hyperedge may differ.

Embeddings of hypergraphs of balls. Let X be a (locally compact) multidi- 
mensional ultrametric space with a family s of ultrametrics. Let the same conditions 
hold for the space Y and the family r of ultrametrics. We consider the corresponding 
hypergraphs C(X, s), C(Y, r) of balls. 

We assume that there exists a one-to-one correspondence between the set $\mathbf{s}$ of ultrametrics on X and some subset of the set $\mathbf{r}$ of ultrametrics on Y. Given this correspondence, we will use the notation $\mathbf{s} \subset \mathbf{r}$. We consider an embedding of the above multidimensional ultrametric spaces as an injective map $i : X \to Y$ for which any s-ball I in X maps into an s-ball J in Y with the same diameter, and moreover the diameters of I and of the image of I in J coincide. The embedding defined in this way is an $\mathbf{s}$-isometry, i.e. an isometry with respect to all $s \in \mathbf{s}$.

At the end of the previous section we discussed an example of an embedding of multidimensional ultrametric spaces and the corresponding embedding of trees and hypergraphs of balls. In general, it might happen that the corresponding map at the level of trees and hypergraphs of balls does not exist. Let us consider an embedding $i : X \to Y$ of ultrametric spaces and let $T(X, s)$, $T(Y, s)$ be the corresponding trees of balls. Let I, J, $I \subset J$, be a pair of balls in X nested without intermediates (i.e. the corresponding vertices in $T(X, s)$ are connected by an edge). Then it is possible that the images $i(I)$ and $i(J)$ are nested with intermediates, i.e. there exists a ball $K \in T(Y, s)$ with $i(I) \subset K \subset i(J)$. In this case the edge (I, J) cannot be mapped onto an edge in $T(Y, s)$.

We say that an ultrametric r on the set X is a small deformation of an ultrametric s on X if these ultrametrics generate the same trees of balls, i.e. $T(X, r) = T(X, s)$. The definition of a small deformation of a family of ultrametrics on X is analogous: a family $\mathbf{r}$ of ultrametrics is a small deformation of a family $\mathbf{s}$ of ultrametrics iff $\mathcal{C}(X, \mathbf{r}) = \mathcal{C}(X, \mathbf{s})$.

The following problem asks whether it is possible to consider, up to a small deformation of a family of metrics, a finite multidimensional ultrametric space as a subset of $\mathbb{Q}_p^d$ with the family (5) of ultrametrics.

Problem. Let X be a finite multidimensional ultrametric space (i.e. containing a finite number of points) with a family $\mathbf{s}$ of ultrametrics.

Is it possible to find a small deformation of $\mathbf{s}$ such that there exists an embedding of X into the multidimensional ultrametric space $\mathbb{Q}_p^d$ for some p, d, and a family of ultrametrics of the form (5)?

Let us note here that we do not claim that the hypergraph $\mathcal{C}(X, \mathbf{s})$ can be embedded into the corresponding hypergraph of balls in $\mathbb{Q}_p^d$. If for a space $(X, \mathbf{s})$ the above problem has a positive solution, we say that the multidimensional ultrametric space $(X, \mathbf{s})$ is embeddable.

5 Discussion 

Given a set X with a family of metrics defined on this set, one can construct the corresponding trees of clusters and the ultrametric spaces described by these trees. When the set X is finite (this condition is satisfied in applications to data analysis), the corresponding ultrametric spaces possess a natural one-to-one correspondence with X. We obtain a multidimensional ultrametric space X with a family of ultrametrics $\mathbf{s}$ and the corresponding set of cluster trees $T(X, s)$, $s \in \mathbf{s}$.

Then we can apply to the collection $T(X, s)$ the analysis described in the present paper and construct the hypergraph $\mathcal{C}(X, \mathbf{s})$ of clusters. This hypergraph is a directed acyclic graph (a graph with a partial order without directed cycles); its undirected cycles describe the different possible histories of growth of a cluster with respect to different metrics in $\mathbf{s}$. Taking into account all possible subsets of the set of metrics $\mathbf{s}$, we generalize the construction of cycles in the graph of clusters to the construction of hyperedges in $\mathcal{C}(X, \mathbf{s})$.

The set X of data may be generated in a complex way, in particular, there may 
be some independent contributions. In mathematics independence is described by 
a dimensionality. A hypergraph is a multidimensional generalization of a graph (in 
particular, a product of graphs is a hypergraph). 






The idea of the approach of the present paper is that there should be some way to describe independencies in data at the level of graphs (and hypergraphs) of clusters. Classification trees (such as trees of clusters) describe the diversity of data; the multidimensional generalization proposed in the present paper should describe the situation where we have independent sources of diversity. In particular, the dimension of a hyperedge will describe the number of sources of diversity (let us note that one can use both the A-dimension and the B-dimension of hyperedges to discuss this subject).

One of the applications of classification trees is in bioinformatics. Clustering procedures are applied in bioinformatics in order to generate phylogenetic trees (a phylogenetic tree is a classification tree which is considered as an inferred evolutionary tree). The metric for the clustering procedure is taken to be the sum of the contributions from the different genetic markers

$$d(X, Y) = \sum_{j=1}^{N} w_j d_j(X, Y), \qquad (8)$$

where $w_j \ge 0$ are weights, X and Y are genomes, and $d_j$ measures the distance between the genomes for the j-th genetic marker (some subsequence of a genome).
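Formula (8) leaves the choice of the marker distances $d_j$ open. The Python sketch below is a purely illustrative instance under hypothetical assumptions of ours: genomes are aligned strings of equal length, each marker is an index range, and $d_j$ is the Hamming distance on that range.

```python
def weighted_distance(X, Y, markers, weights):
    """d(X, Y) = sum_j w_j d_j(X, Y) as in (8).

    Assumptions (illustrative only): genomes are equal-length strings, marker j
    is the slice X[start:end] given by markers[j] = (start, end), and d_j is the
    Hamming distance (number of differing positions) on that slice.
    """
    total = 0
    for (start, end), w in zip(markers, weights):
        d_j = sum(a != b for a, b in zip(X[start:end], Y[start:end]))
        total += w * d_j
    return total


markers = [(0, 4), (4, 8)]  # two hypothetical markers of length 4
X, Y = "ACGTACGT", "ACGAACGG"
```

Here each marker differs in one position, so the weights (1, 2) give d(X, Y) = 3, while the weights (1, 0) keep only the first marker and give 1.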

Since one may use different weights $w_j$ for the contributions to the classification metric d(·, ·) from the different genetic markers, the tree of clusters generated in this way is essentially non-unique. In particular, taking all weights $w_j$ except a single one to be equal to zero, we obtain the genetic distance measured for a fixed genetic marker.

It was found that clustering (or analogous procedures for the construction of classification trees) applied to different parts of genomes (and, in general, clusterings with different parameters $w_j$) may generate different trees. The non-uniqueness of phylogenetic trees is important in situations where some parts of a genome have different origins, e.g. in cases of reticulate evolution such as hybridization, endosymbiosis or horizontal gene transfer (when some parts of the genome are transferred from different species). It was proposed to use the "forest of life" (or "phylogenetic network") point of view instead of the "tree of life" to describe such phenomena; see [4] for a review of general applications of networks in biology and [2, 3] for a discussion of phylogenetic trees and evolution. For a review of mathematical methods for phylogenetic networks one can mention [9] and the works by A. Dress and coauthors [10, 11].

Example. Let us consider the metric above for the case of two genetic markers with the corresponding metrics $d_1(\cdot,\cdot)$ and $d_2(\cdot,\cdot)$ and the total metric $d = w_1 d_1 + w_2 d_2$. Assume that each of the two genetic markers may take two possible values, which we denote by $0$ and $1$, and that the distance between $0$ and $1$ equals one. We have four possible variants of a genome (four possible pairs of genetic markers):

$$A = (0,0), \quad B = (1,0), \quad C = (0,1), \quad D = (1,1).$$

Then, varying the weights $w_1$ and $w_2$, we obtain the cluster system described in Example 2 of Section 2. This cluster system has dimension two, which corresponds to the presence of two genetic markers that can vary independently.
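The example can be reproduced with a small Python sketch. It assumes that $d_j$ equals $0$ for equal and $1$ for different values of the $j$-th marker, takes clusters to be balls of the chain distance from the Appendix (computed here by a minimax Floyd-Warshall pass, which amounts to single-linkage clustering), and uses the illustrative weights $(w_1, w_2) = (1,2), (2,1), (1,1)$; all function names are hypothetical. For a finite space, closed balls $\{j : d(i,j) \le r\}$ over the attained values $r$ give the same sets as the open balls over all radii $R > 0$.

```python
# Minimal sketch (hypothetical helper names): the cluster system of the
# two-marker example, collected over several choices of the weights (w1, w2).

GENOMES = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (1, 1)}


def weighted_metric(x, y, w):
    """d(x, y) = sum_j w_j * d_j(x, y), with d_j = 1 if marker j differs, else 0."""
    return sum(wj * (xj != yj) for wj, xj, yj in zip(w, x, y))


def chain_distance(points, metric):
    """Chain distance: the smallest possible largest step over all paths a -> b
    (minimax Floyd-Warshall, equivalent to single linkage)."""
    names = list(points)
    d = {(a, b): metric(points[a], points[b]) for a in names for b in names}
    for k in names:
        for a in names:
            for b in names:
                d[a, b] = min(d[a, b], max(d[a, k], d[k, b]))
    return d


def clusters(points, metric):
    """All chain-distance balls {j : d(i, j) <= r} over centres i and attained radii r."""
    d = chain_distance(points, metric)
    radii = set(d.values())
    return {
        frozenset(j for j in points if d[i, j] <= r)
        for i in points
        for r in radii
    }


system = set()
for w in [(1, 2), (2, 1), (1, 1)]:
    # Each weight choice gives one clustering (one tree); collect them all.
    system |= clusters(GENOMES, lambda x, y, w=w: weighted_metric(x, y, w))

labels = sorted(("".join(sorted(c)) for c in system), key=lambda s: (len(s), s))
print(labels)
# ['A', 'B', 'C', 'D', 'AB', 'AC', 'BD', 'CD', 'ABCD']
```

Each weight choice gives a single tree, but the collected system contains two different clusters, $\{A,B\}$ and $\{A,C\}$, through which $\{A\}$ can grow into $\{A,B,C,D\}$. This is consistent with the two independently varying markers behind the dimension two of the cluster system.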

One of the problems arising in the study of phylogenetic networks is to construct these networks and to embed trees (obtained, in particular, by clustering of genetic sequences) into these graphs. In our approach we can generate graphs of clusters with cycles using the hypergraph clustering procedure introduced above. The embedding of the corresponding phylogenetic trees (obtained by fixing one metric from the family of metrics used for clustering) is then obtained automatically.
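A minimal Python sketch of how cycles arise when several clusterings are merged into one graph. Its vertices are the clusters of the two-marker example above (for $w_1 < w_2$ the clusters $\{A,B\}$ and $\{C,D\}$, for $w_1 > w_2$ the clusters $\{A,C\}$ and $\{B,D\}$, and in every case the four singletons and $\{A,B,C,D\}$); as an assumption borrowed from the tree construction in the Appendix, two clusters are joined by an edge when one contains the other and no cluster lies strictly between them. The cycle rank $E - V + (\text{number of components})$ counts independent cycles and is $0$ for a tree. The function names are hypothetical.

```python
# Minimal sketch (hypothetical names): Hasse diagram of a family of clusters
# ordered by inclusion, and its cycle rank E - V + components.


def hasse_edges(clusters):
    """Edges (small, big) with small a proper subset of big and nothing in between."""
    edges = []
    for small in clusters:
        for big in clusters:
            if small < big and not any(small < mid < big for mid in clusters):
                edges.append((small, big))
    return edges


def cycle_rank(vertices, edges):
    """Number of independent cycles of the undirected graph (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    components = len(vertices)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - len(vertices) + components


# Clusters of the two-marker example collected over w1 < w2, w1 > w2, w1 = w2.
CLUSTERS = [frozenset(s) for s in ["A", "B", "C", "D", "AB", "CD", "AC", "BD", "ABCD"]]
EDGES = hasse_edges(CLUSTERS)
print(len(EDGES), cycle_rank(CLUSTERS, EDGES))  # 12 4

# A single clustering (here w1 < w2) is a tree: cycle rank 0.
TREE = [frozenset(s) for s in ["A", "B", "C", "D", "AB", "CD", "ABCD"]]
print(cycle_rank(TREE, hasse_edges(TREE)))  # 0
```

Every cluster of a fixed-metric tree is a vertex of the merged graph and the inclusion order is preserved, which is one way to read the automatic embedding mentioned above. A tree edge may, however, become a path: for $w_1 = w_2$ the edge between $\{A\}$ and $\{A,B,C,D\}$ passes through $\{A,B\}$ or $\{A,C\}$ in the merged graph.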

In our approach we combine all the trees from the "forest of data" (in particular, the "forest of life") into a single multidimensional hypergraph structure, a "hypergraph of life". Phylogenetic networks describe the diversity of genetic information. The application of hypergraph clustering allows us to investigate the dimensions of hyperedges of the phylogenetic hypergraph. These dimensions (A-dimensions and B-dimensions) describe the number of sources of genetic diversity for the corresponding parts of a genome.

6 Appendix: Ultrametric spaces and trees 

In this section we discuss the clustering procedure and some results of ultrametric analysis, which can be found, in particular, in [6]. A review of some results of $p$-adic mathematical physics can be found in [?].

Let us recall the definition of clustering. The clustering procedure generates a partially ordered tree of clusters. In this tree the vertices are clusters, the partial order is defined by inclusion of clusters, and an edge connects two nested clusters with no intermediate cluster between them. The border of this tree is an ultrametric space with the ultrametric defined by the chain distance.

Definition 2 A sequence of points $a = x_0, x_1, \ldots, x_{n-1}, x_n = b$ in a metric space $(M,\rho)$ is called an $\varepsilon$-chain connecting two points $a$ and $b$ if $\rho(x_k, x_{k+1}) < \varepsilon$ for all $0 \le k < n$ and some $\varepsilon > 0$. If there exists an $\varepsilon$-chain connecting $a$ and $b$, then $a$ and $b$ are $\varepsilon$-connected.

Let $(M,\rho)$ be an arbitrary metric space. Then the chain distance $d(a,b)$ between $a$ and $b$ is defined by

$$d(a,b) = \inf\{\varepsilon : a, b \text{ are } \varepsilon\text{-connected}\}.$$

This distance has all the properties of an ultrametric except for the non-degeneracy property. In particular, it satisfies the strong triangle inequality

$$d(a,b) \le \max\bigl(d(a,c),\, d(c,b)\bigr) \quad \forall a, b, c. \tag{9}$$

The cluster $C(i,R)$ in a metric space $(M,\rho)$ is the ball with center $i$ and radius $R$ with respect to the chain distance, i.e. the set $\{j \in M : d(i,j) < R\}$.
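A minimal Python sketch of Definition 2, the chain distance and the clusters $C(i,R)$ for a finite metric space. The five points on the real line with $\rho(x,y) = |x-y|$ are an illustrative assumption and all function names are hypothetical. Instead of taking the infimum over $\varepsilon$ directly, the sketch computes the chain distance by Kruskal-style merging of the closest pairs (single linkage), then checks against Definition 2 that this value is exactly the infimum and that the strong triangle inequality holds.

```python
from collections import deque
from itertools import combinations

POINTS = [0.0, 1.0, 3.0, 7.0, 8.0]  # illustrative finite metric space


def rho(x, y):
    return abs(x - y)


def eps_connected(points, metric, a, b, eps):
    """Definition 2: is there a chain a = x_0, ..., x_n = b with every step < eps?"""
    seen, queue = {a}, deque([a])
    while queue:
        x = queue.popleft()
        if x == b:
            return True
        for y in points:
            if y not in seen and metric(x, y) < eps:
                seen.add(y)
                queue.append(y)
    return False


def chain_distance(points, metric):
    """Kruskal-style merging: two points get chain distance delta when their
    groups are first joined by a pair at distance delta (single linkage)."""
    group = {p: {p} for p in points}
    d = {(p, p): 0.0 for p in points}
    for x, y in sorted(combinations(points, 2), key=lambda xy: metric(*xy)):
        gx, gy = group[x], group[y]
        if gx is gy:
            continue
        delta = metric(x, y)
        for u in gx:
            for v in gy:
                d[u, v] = d[v, u] = delta
        merged = gx | gy
        for u in merged:
            group[u] = merged
    return d


def cluster(points, d, i, R):
    """C(i, R) = {j : d(i, j) < R}, a ball of the chain distance."""
    return {j for j in points if d[i, j] < R}


d = chain_distance(POINTS, rho)

# The infimum in the definition of d(a, b): a, b are eps-connected for every
# eps > d(a, b) but not for eps = d(a, b), since steps must be strictly shorter.
for a, b in combinations(POINTS, 2):
    assert eps_connected(POINTS, rho, a, b, d[a, b] + 1e-9)
    assert not eps_connected(POINTS, rho, a, b, d[a, b])

# Strong triangle inequality (9).
assert all(d[a, b] <= max(d[a, c], d[c, b]) for a in POINTS for b in POINTS for c in POINTS)

print(sorted(cluster(POINTS, d, 0.0, 3.0)))  # [0.0, 1.0, 3.0]
print(sorted(cluster(POINTS, d, 7.0, 3.0)))  # [7.0, 8.0]
```

Merging the closest pairs first is a design choice for the sketch: it yields all chain distances in one pass, while the breadth-first search keeps the check tied to the $\varepsilon$-chains of Definition 2.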
Definition 3 The clustering of the space $M$ is a set of clusters in $M$ such that:

i) every element in $M$ belongs to some cluster;

ii) for any pair $a, b$ of elements in $M$ there exists a minimal cluster $\sup(a,b)$ containing both elements;

iii) for arbitrary embedded clusters $A \subset B$, every increasing sequence of embedded clusters $\{A_i\}$, $A \subset \cdots \subset A_i \subset \cdots \subset B$, is finite;

iv) the total number of clusters in the clustering is finite or countable.

Example. Let $D = \{d_i\}$ be a countable set of positive numbers without positive accumulation points. Consider the clustering $C_D$ of the metric space $(M,\rho)$ which contains all clusters of chain radii $d_i \in D$ and arbitrary centers.
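For a finite metric space the clustering $C_D$ can be listed explicitly. The Python sketch below assumes a finite point set on the real line and a finite set of radii $D$ (points and radii are illustrative, names hypothetical), uses the open balls $\{j : d(i,j) < R\}$ of the chain distance, and checks conditions i) and ii) of Definition 3, looking for a least cluster containing $a$ and $b$; conditions iii) and iv) hold automatically when $M$ is finite. Condition ii) needs a radius larger than every chain distance, so $D$ is chosen to contain one.

```python
from itertools import product

POINTS = [0.0, 1.0, 3.0, 7.0, 8.0]
D = [1.5, 3.0, 5.0]  # illustrative radii; 5.0 exceeds every chain distance here


def chain_distance(points):
    """Chain distance for rho(x, y) = |x - y| by a minimax Floyd-Warshall pass
    (itertools.product varies k slowest, so k is the outer loop)."""
    d = {(a, b): abs(a - b) for a, b in product(points, repeat=2)}
    for k, a, b in product(points, repeat=3):
        d[a, b] = min(d[a, b], max(d[a, k], d[k, b]))
    return d


def clustering_CD(points, radii):
    """C_D: all balls {j : d(i, j) < R} with R in D and arbitrary centres i."""
    d = chain_distance(points)
    return {frozenset(j for j in points if d[i, j] < R) for i in points for R in radii}


def sup(clusters, a, b):
    """Least cluster containing both a and b, or None if there is none."""
    containing = [c for c in clusters if a in c and b in c]
    if not containing:
        return None
    smallest = min(containing, key=len)
    return smallest if all(smallest <= c for c in containing) else None


CD = clustering_CD(POINTS, D)
# Definition 3 i): every point lies in some cluster.
assert all(any(p in c for c in CD) for p in POINTS)
# Definition 3 ii): every pair has a minimal cluster sup(a, b).
assert all(sup(CD, a, b) is not None for a, b in product(POINTS, repeat=2))
print(sorted(sup(CD, 0.0, 3.0)))  # [0.0, 1.0, 3.0]
print(len(CD))  # 5
```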

An ultrametric space is a metric space with a metric $d(x,y)$ satisfying the strong triangle inequality (9). Ultrametric spaces are dual to trees with a partial order. Below we describe part of this duality construction.

For a (complete, locally compact) ultrametric space $X$ we consider the set $T(X)$, which is the result of clustering of $X$ with respect to the ultrametric, i.e. $T(X)$ contains all the balls in $X$ of nonzero diameter and the balls of zero diameter which are maximal subballs in balls of nonzero diameter. This set possesses a natural structure of a partially ordered tree. The partial order in $T(X)$ is defined by inclusion of balls.

Two vertices $I$ and $J$ in $T(X)$ are connected by an edge if the corresponding balls are ordered by inclusion, say $I \supset J$ (i.e. one of the balls contains the other), and there are no intermediate balls between $I$ and $J$.

On the tree $T(X)$ we have the natural increasing positive function which associates to any vertex the diameter of the corresponding ball.
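For a finite ultrametric space the tree $T(X)$ can be built directly. The Python sketch below assumes a finite $X$ (so the zero-diameter balls are the singletons, which play the role of the maximal zero-diameter subballs), uses closed balls, and takes an illustrative ultrametric on five points; the values and names are hypothetical. It lists the balls, joins $I \supset J$ by an edge when no ball lies strictly between them, attaches the diameter to each vertex, and checks that the result is a tree (connected, with one edge fewer than vertices).

```python
# Illustrative ultrametric on five points: a, b at 1; c at 2 from {a, b};
# d, e at 1; every other pair at 4.
X = ["a", "b", "c", "d", "e"]
DIST = {frozenset("ab"): 1, frozenset("ac"): 2, frozenset("bc"): 2, frozenset("de"): 1}


def dist(x, y):
    return 0 if x == y else DIST.get(frozenset((x, y)), 4)


def balls(space, metric):
    """All closed balls {y : metric(x, y) <= r} for centres x and attained radii r."""
    radii = {metric(x, y) for x in space for y in space}
    return {frozenset(y for y in space if metric(x, y) <= r) for x in space for r in radii}


def diameter(ball, metric):
    return max(metric(x, y) for x in ball for y in ball)


def tree_of_balls(space, metric):
    """Vertices: the balls. Edges (I, J): J a proper subset of I, nothing in between.
    F: the diameter of each ball, increasing with respect to inclusion."""
    vertices = balls(space, metric)
    edges = [
        (big, small)
        for big in vertices
        for small in vertices
        if small < big and not any(small < mid < big for mid in vertices)
    ]
    F = {v: diameter(v, metric) for v in vertices}
    return vertices, edges, F


def is_tree(vertices, edges):
    """Connected and |E| = |V| - 1, checked by a depth-first search."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices) and len(edges) == len(vertices) - 1


V, E, F = tree_of_balls(X, dist)
print(len(V), len(E), is_tree(V, E))  # 9 8 True
print(F[frozenset("abc")])  # 2
```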

Assume now that we have a partially ordered tree $T$ satisfying the conditions:

1) The graph $T$ is a tree, i.e. for any pair of vertices there exists a finite path in $T$ which connects these vertices, and $T$ does not contain cycles.

2) Each vertex in $T$ is incident to a finite set of edges.

3) For any finite path in $T$ there exists a unique maximal vertex in this path.

Let us choose an arbitrary positive increasing (w.r.t. the partial order) function $F$ on this tree. Then we define the ultrametric on the set of vertices of the tree $T$ as follows: $d(I,J) = F(\sup(I,J))$ for $I \neq J$, where $\sup(I,J)$ is the supremum of the vertices $I, J$ with respect to the partial order. The vertex $\sup(I,J)$ coincides with the above-mentioned unique maximal vertex in the path connecting $I$ and $J$.

Then we take the completion of the set of vertices with respect to the defined ultrametric and eliminate from the completion all the inner points of the tree (a vertex of the tree is inner if it does not belong to the border of the tree, i.e. if it is incident to more than one edge). We denote the obtained space by $X(T)$; this space is ultrametric, complete and locally compact. The space $X(T)$ is called the border of the tree $T$.
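For a finite tree the completion adds no new points (a finite metric space is complete), so the border $X(T)$ is just the set of vertices incident to one edge, with $d(I,J) = F(\sup(I,J))$. The Python sketch below assumes the tree is given by parent pointers (the partial order being "is an ancestor of") and uses an illustrative increasing function $F$; the tree, the values and the names are hypothetical. It computes $\sup(I,J)$ as the lowest common ancestor, which is the maximal vertex on the path between $I$ and $J$, and checks the strong triangle inequality on the border.

```python
from itertools import product

# Illustrative partially ordered tree given by parent pointers (root R has none);
# the order is "I <= J iff J is I itself or an ancestor of I".
PARENT = {"S": "P", "c": "P", "a": "S", "b": "S", "d": "Q", "e": "Q", "P": "R", "Q": "R"}
# Positive function, increasing towards the root (values on leaves are never used).
F = {"R": 4.0, "P": 2.0, "Q": 1.0, "S": 1.0}


def ancestors(v):
    """v, parent(v), parent(parent(v)), ..., root."""
    chain = [v]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def sup(i, j):
    """Least common upper bound: the maximal vertex on the path from i to j."""
    above_i = set(ancestors(i))
    return next(v for v in ancestors(j) if v in above_i)


def border():
    """Vertices incident to exactly one edge (the leaves of this finite tree)."""
    vertices = set(PARENT) | set(PARENT.values())
    degree = {v: 0 for v in vertices}
    for child, par in PARENT.items():
        degree[child] += 1
        degree[par] += 1
    return sorted(v for v in vertices if degree[v] == 1)


def d(i, j):
    """Ultrametric d(I, J) = F(sup(I, J)) for I != J, and 0 for I == J."""
    return 0.0 if i == j else F[sup(i, j)]


X = border()
print(X)  # ['a', 'b', 'c', 'd', 'e']
assert all(d(x, y) <= max(d(x, z), d(z, y)) for x, y, z in product(X, repeat=3))
print(d("a", "b"), d("a", "c"), d("a", "e"))  # 1.0 2.0 4.0
```

The root in this example has two children, so it is not counted as a border vertex; a root with a single child would need separate handling in this degree-based sketch.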